24 - Random book picker

python
Published

August 12, 2025

1 Goal

Today my goal is to write a Python program that picks books at random from the unread books on my Goodreads list, with filtering options. It should be possible to filter by publication year, maximum number of pages, minimum rating, date added, and maybe others.

import pandas as pd
df = pd.read_csv('data/day24/goodreads_library_export.csv')
df.head(5)
Book Id Title Author Author l-f Additional Authors ISBN ISBN13 My Rating Average Rating Publisher ... Date Read Date Added Bookshelves Bookshelves with positions Exclusive Shelf My Review Spoiler Private Notes Read Count Owned Copies
0 51648276 Drive Your Plow Over the Bones of the Dead Olga Tokarczuk Tokarczuk, Olga Antonia Lloyd-Jones, Beata Poźniak ="" ="" 0 3.94 Penguin Audio ... NaN 2023/11/08 NaN NaN read NaN NaN NaN 1 0
1 18112493 Parissyndromet Heidi Furre Furre, Heidi NaN ="8282880035" ="9788282880039" 0 4.12 Flamme ... NaN 2024/12/21 NaN NaN read NaN NaN NaN 1 0
2 25489025 The Vegetarian Han Kang Kang, Han Deborah Smith ="0553448188" ="9780553448184" 0 3.64 Hogarth ... NaN 2024/12/21 NaN NaN read NaN NaN NaN 1 0
3 28921 The Remains of the Day Kazuo Ishiguro Ishiguro, Kazuo NaN ="" ="" 0 4.14 Faber & Faber ... NaN 2025/07/15 NaN NaN read NaN NaN NaN 1 0
4 43868109 Empire of Pain: The Secret History of the Sack... Patrick Radden Keefe Keefe, Patrick Radden NaN ="0385545681" ="9780385545686" 0 4.54 Doubleday ... NaN 2025/07/10 to-read to-read (#298) to-read NaN NaN NaN 0 0

5 rows × 24 columns

2 Data Cleaning

First, I need to clean the data a little by removing unwanted columns and rows.

# Remove read books
to_read = df[df['Read Count'] == 0]
to_read
Book Id Title Author Author l-f Additional Authors ISBN ISBN13 My Rating Average Rating Publisher ... Date Read Date Added Bookshelves Bookshelves with positions Exclusive Shelf My Review Spoiler Private Notes Read Count Owned Copies
4 43868109 Empire of Pain: The Secret History of the Sack... Patrick Radden Keefe Keefe, Patrick Radden NaN ="0385545681" ="9780385545686" 0 4.54 Doubleday ... NaN 2025/07/10 to-read to-read (#298) to-read NaN NaN NaN 0 0
5 40163119 Say Nothing: A True Story of Murder and Memory... Patrick Radden Keefe Keefe, Patrick Radden NaN ="0385521316" ="9780385521314" 0 4.47 Doubleday ... NaN 2025/07/10 to-read to-read (#297) to-read NaN NaN NaN 0 0
6 42683 On Writing Ernest Hemingway Hemingway, Ernest Larry W. Phillips, Charles Scribner Jr. ="0684854295" ="9780684854298" 0 4.02 Scribner ... NaN 2025/06/12 to-read to-read (#296) to-read NaN NaN NaN 0 0
7 22816087 Seveneves Neal Stephenson Stephenson, Neal NaN ="" ="" 0 4.00 William Morrow ... NaN 2025/06/11 to-read to-read (#295) to-read NaN NaN NaN 0 0
8 50365 A Suitable Boy (A Bridge of Leaves, #1) Vikram Seth Seth, Vikram NaN ="0060786523" ="9780060786526" 0 4.11 Harper Perennial Modern Classics ... NaN 2025/06/11 to-read to-read (#294) to-read NaN NaN NaN 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
391 28815 Influence: The Psychology of Persuasion Robert B. Cialdini Cialdini, Robert B. NaN ="006124189X" ="9780061241895" 0 4.22 Harper Business ... NaN 2018/08/27 to-read to-read (#5) to-read NaN NaN NaN 0 0
394 2255 Way of the Peaceful Warrior: A Book That Chang... Dan Millman Millman, Dan NaN ="1932073205" ="9781932073201" 0 4.13 HJ Kramer ... NaN 2018/08/27 to-read to-read (#4) to-read NaN NaN NaN 0 0
396 19795 Power vs. Force: The Hidden Determinants of Hu... David R. Hawkins Hawkins, David R. NaN ="1561709336" ="9781561709335" 0 4.15 Hay House ... NaN 2018/08/21 to-read to-read (#3) to-read NaN NaN NaN 0 0
404 566259 Fire in the Belly: On Being a Man Sam Keen Keen, Sam NaN ="0553351370" ="9780553351378" 0 3.81 Bantam ... NaN 2018/08/21 to-read to-read (#2) to-read NaN NaN NaN 0 0
405 1052 The Richest Man in Babylon George S. Clason Clason, George S. NaN ="0451205367" ="9780451205360" 0 4.23 Berkley Books ... NaN 2018/08/21 to-read to-read (#1) to-read NaN NaN NaN 0 0

308 rows × 24 columns
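
Filtering on Read Count == 0 would also keep any book that has no recorded read but sits on a shelf other than to-read. A minimal sketch of a stricter filter, assuming the Exclusive Shelf column holds to-read for unread books as in the rows above. The tiny DataFrame, including its currently-reading row, is made up for illustration:

```python
import pandas as pd

# Hypothetical miniature of the Goodreads export, for illustration only
sample = pd.DataFrame({
    'Title': ['Book A', 'Book B', 'Book C'],
    'Exclusive Shelf': ['read', 'currently-reading', 'to-read'],
    'Read Count': [1, 0, 0],
})

# Filter used in this post: every book with no recorded read
by_count = sample[sample['Read Count'] == 0]

# Stricter alternative: only books explicitly shelved as to-read
by_shelf = sample[sample['Exclusive Shelf'] == 'to-read']

print(by_count['Title'].tolist())  # ['Book B', 'Book C']
print(by_shelf['Title'].tolist())  # ['Book C']
```

On the export used here, both filters may well give the same 308 rows; the difference only shows up if some unread books live on other shelves.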

df.columns
Index(['Book Id', 'Title', 'Author', 'Author l-f', 'Additional Authors',
       'ISBN', 'ISBN13', 'My Rating', 'Average Rating', 'Publisher', 'Binding',
       'Number of Pages', 'Year Published', 'Original Publication Year',
       'Date Read', 'Date Added', 'Bookshelves', 'Bookshelves with positions',
       'Exclusive Shelf', 'My Review', 'Spoiler', 'Private Notes',
       'Read Count', 'Owned Copies'],
      dtype='object')
# Columns that I want to keep
columns = ['Title', 'Author', 'Average Rating', 'Publisher',
       'Number of Pages', 'Original Publication Year', 'Date Added']

to_read = to_read[columns]
to_read.head(5)
   Title                                              Author                Average Rating  Publisher                         Number of Pages  Original Publication Year  Date Added
4  Empire of Pain: The Secret History of the Sack...  Patrick Radden Keefe  4.54            Doubleday                         535.0            2021.0                     2025/07/10
5  Say Nothing: A True Story of Murder and Memory...  Patrick Radden Keefe  4.47            Doubleday                         441.0            2018.0                     2025/07/10
6  On Writing                                         Ernest Hemingway      4.02            Scribner                          160.0            1984.0                     2025/06/12
7  Seveneves                                          Neal Stephenson       4.00            William Morrow                    872.0            2015.0                     2025/06/11
8  A Suitable Boy (A Bridge of Leaves, #1)            Vikram Seth           4.11            Harper Perennial Modern Classics  1474.0           1993.0                     2025/06/11
# Remove NaN values
to_read = to_read.dropna()
to_read.head(5)
   Title                                              Author                Average Rating  Publisher                         Number of Pages  Original Publication Year  Date Added
4  Empire of Pain: The Secret History of the Sack...  Patrick Radden Keefe  4.54            Doubleday                         535.0            2021.0                     2025/07/10
5  Say Nothing: A True Story of Murder and Memory...  Patrick Radden Keefe  4.47            Doubleday                         441.0            2018.0                     2025/07/10
6  On Writing                                         Ernest Hemingway      4.02            Scribner                          160.0            1984.0                     2025/06/12
7  Seveneves                                          Neal Stephenson       4.00            William Morrow                    872.0            2015.0                     2025/06/11
8  A Suitable Boy (A Bridge of Leaves, #1)            Vikram Seth           4.11            Harper Perennial Modern Classics  1474.0           1993.0                     2025/06/11

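I notice that some of the columns are of type float; I want them to be integers instead. I also convert Date Added from a string to a datetime, so that I can filter on year and month later.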

to_read.dtypes
Title                         object
Author                        object
Average Rating               float64
Publisher                     object
Number of Pages              float64
Original Publication Year    float64
Date Added                    object
dtype: object
to_read = to_read.astype({'Number of Pages': int, 'Original Publication Year': int})
to_read['Date Added'] = pd.to_datetime(to_read['Date Added'])
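
A column that contains missing values cannot use the plain int type, which is why dropna has to run before astype here. A minimal sketch of an alternative, using pandas' nullable Int64 dtype on made-up data, that would keep rows with missing values instead of dropping them:

```python
import numpy as np
import pandas as pd

# Made-up data with one missing page count
books = pd.DataFrame({
    'Title': ['Book A', 'Book B'],
    'Number of Pages': [160.0, np.nan],
})

# 'Int64' (capital I) is pandas' nullable integer dtype;
# the missing value is stored as <NA> instead of forcing the column to float
books['Number of Pages'] = books['Number of Pages'].astype('Int64')

print(books['Number of Pages'].dtype)            # Int64
print(books['Number of Pages'].isna().tolist())  # [False, True]
```

The trade-off is that later filters then have to decide what to do with books whose page count or year is unknown.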

3 Creating the random book picker function

def random_book(df, options: int = 1, title: str = None, author: str = None,
                min_rating: float = 0, publisher: str = None,
                min_year: int = None, max_year: int = None,
                added_year: int = None, added_month: int = None):
    """Return up to `options` randomly chosen rows of df that match the filters."""
    # Validate the month before doing any filtering
    if added_month is not None and not 1 <= added_month <= 12:
        print('Month out of range, choose a number between 1 and 12')
        return

    if title is not None:
        df = df.loc[df['Title'].str.contains(title, case=False)]

    if author is not None:
        df = df.loc[df['Author'].str.contains(author, case=False)]
        if df.empty:
            print("You haven't saved any books that you want to read by that author")
            return

    if min_rating is not None:
        df = df.loc[df['Average Rating'] >= min_rating]

    if publisher is not None:
        df = df.loc[df['Publisher'].str.contains(publisher, case=False)]

    # Years outside the data's range need no clamping; the comparison handles them
    if min_year is not None:
        df = df.loc[df['Original Publication Year'] >= min_year]

    if max_year is not None:
        df = df.loc[df['Original Publication Year'] <= max_year]

    # Keep only the books added in the requested year
    if added_year is not None:
        df = df.loc[df['Date Added'].dt.year == added_year]

    if added_month is not None:
        df = df.loc[df['Date Added'].dt.month == added_month]

    # Nothing left to pick from
    if df.empty:
        print('No books match those filters')
        return

    # Sample without replacement so the same book is never suggested twice
    return df.sample(n=min(options, len(df)))
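
When more than one suggestion is requested, drawing rows without replacement guarantees that the same book isn't suggested twice. A minimal sketch of that idea with DataFrame.sample on a made-up reading list (the names shelf, picks, wanted and capped are only for illustration):

```python
import pandas as pd

# Made-up reading list
shelf = pd.DataFrame({'Title': ['Book A', 'Book B', 'Book C', 'Book D']})

# sample() draws without replacement by default, so no row is picked twice;
# random_state makes the draw repeatable
picks = shelf.sample(n=3, random_state=0)

# Capping n avoids an error when fewer rows remain than were requested
wanted = 10
capped = shelf.sample(n=min(wanted, len(shelf)))

print(picks['Title'].is_unique)  # True
print(len(capped))               # 4
```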
       

4 Testing

random_book(to_read, added_month=2030)
Month out of range, choose a number between 1 and 12
random_book(to_read, title='japan')
    Title                       Author        Average Rating  Publisher     Number of Pages  Original Publication Year  Date Added
50  Bushido: The Soul of Japan  Inazō Nitobe  3.84            Kodansha USA  160              1899                       2024-04-21
random_book(to_read, min_year=1800, max_year=1940)
     Title                   Author             Average Rating  Publisher                  Number of Pages  Original Publication Year  Date Added
374  The Brothers Karamazov  Fyodor Dostoevsky  4.39            Farrar, Straus and Giroux  796              1880                       2018-11-10
random_book(to_read, min_rating=4.1)
    Title                                              Author      Average Rating  Publisher    Number of Pages  Original Publication Year  Date Added
64  My Traitor's Heart: A South African Exile Retu...  Rian Malan  4.25            Grove Press  349              1990                       2023-04-05
random_book(to_read, author='Murakami')
You haven't saved any books that you want to read by that author
random_book(to_read, options=3)
     Title                                     Author                 Average Rating  Publisher                           Number of Pages  Original Publication Year  Date Added
32   Utz                                       Bruce Chatwin          3.67            Penguin Publishing Group            154              1988                       2024-12-29
321  Nine Chains to the Moon                   R. Buckminster Fuller  3.85            Southern Illinois University Press  384              1963                       2020-10-19
179  Swann’s Way (In Search of Lost Time, #1)  Marcel Proust          4.16            Penguin Classics                    468              1913                       2021-09-10

5 Conclusion

There we have it, a simple random book picker.

It isn’t optimized for speed, however: I repeatedly re-assign the DataFrame instead of collecting all the filter conditions and applying them to the DataFrame in a single operation.
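
One way that could look, sketched on made-up data: start from an all-True boolean mask, AND each requested condition into it, and index the DataFrame once at the end. The function name filter_books and its parameters are hypothetical, and max_pages is included only to show the page filter mentioned in the goal. Whether this is measurably faster on a few hundred rows is untested.

```python
import pandas as pd

def filter_books(df, min_rating=None, max_pages=None, min_year=None, max_year=None):
    """Combine all requested conditions into one mask, then filter once."""
    mask = pd.Series(True, index=df.index)
    if min_rating is not None:
        mask &= df['Average Rating'] >= min_rating
    if max_pages is not None:
        mask &= df['Number of Pages'] <= max_pages
    if min_year is not None:
        mask &= df['Original Publication Year'] >= min_year
    if max_year is not None:
        mask &= df['Original Publication Year'] <= max_year
    return df.loc[mask]

# Made-up rows for illustration
books = pd.DataFrame({
    'Title': ['Book A', 'Book B', 'Book C'],
    'Average Rating': [4.5, 3.6, 4.1],
    'Number of Pages': [535, 160, 872],
    'Original Publication Year': [2021, 1984, 2015],
})

result = filter_books(books, min_rating=4.0, max_pages=600)
print(result['Title'].tolist())  # ['Book A']
```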

The function also takes a lot of parameters; using *args and **kwargs instead could be an option.
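
A rough sketch of what a **kwargs version might look like, on made-up data. The FILTERS table, the function name filter_by_kwargs and the keyword names are all hypothetical; each keyword maps to a column and a comparison, and unknown keywords are rejected so that typos don't pass silently:

```python
import operator
import pandas as pd

# Hypothetical mapping from keyword name to (column, comparison)
FILTERS = {
    'min_rating': ('Average Rating', operator.ge),
    'max_pages': ('Number of Pages', operator.le),
    'min_year': ('Original Publication Year', operator.ge),
    'max_year': ('Original Publication Year', operator.le),
}

def filter_by_kwargs(df, **kwargs):
    """Apply every keyword found in FILTERS; raise on unknown keywords."""
    unknown = set(kwargs) - set(FILTERS)
    if unknown:
        raise TypeError(f'Unknown filter(s): {sorted(unknown)}')
    for name, value in kwargs.items():
        column, compare = FILTERS[name]
        df = df.loc[compare(df[column], value)]
    return df

# Made-up rows for illustration
books = pd.DataFrame({
    'Title': ['Book A', 'Book B', 'Book C'],
    'Average Rating': [4.5, 3.6, 4.1],
    'Number of Pages': [535, 160, 872],
    'Original Publication Year': [2021, 1984, 2015],
})

recent_short = filter_by_kwargs(books, min_year=2000, max_pages=600)
print(recent_short['Title'].tolist())  # ['Book A']
```

The price of this design is that the signature no longer documents the available filters or their types, which the explicit parameters do.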

It would also have been better if there were an API for one’s own Goodreads library, so that I wouldn’t have to download a CSV file whenever new books are added. This was, however, just a for-fun coding task.

I’m also lacking a ‘genre’ column, which would be nice to have for filtering books.